Voice-operated service.

In a telephone banking service, initial log-on (before the service has identified the user) must use
speaker-independent recognition (i.e. the speech recognizer 2 operates in accordance with word templates
drawn from the population at large). The first digit of a sequence identifying the user is compared with
certain templates of two sets of templates to determine which set most closely resembles the user's speech.
The two sets of templates may comprise standard templates formed by male and female speakers respectively.
This application is a divisional application of EP patent application No. 89905458.9.

The present invention relates to voice-operated services, for example services (such as home banking)
which may be offered on a telephone network. In particular, the invention concerns recognition of user
identifiers which a user must enter in order to identify himself or herself to the apparatus operating the
service.

In accordance with the invention, apparatus for operating a voice-operated service comprises a speech
recogniser operable to compare each word of a sequence of words supplied thereto with stored templates
to obtain a measure of similarity, the templates comprising a first set containing first templates
corresponding to each word of a set of words, and a second set containing second templates corresponding
to each word of the said set, and control means arranged in operation so to control the recogniser that:

(a) the first word of the sequence is compared with templates of the first set of templates corresponding
to certain predetermined words of the set and with templates of the second set of templates
corresponding to other words of the set;

(b) the measures are compared to determine which word of the set the said first word most closely
resembles; and

(c) subsequent words of the sequence are compared with either the first or the second set of templates, 
in dependence on which word the said first word has been determined as resembling. 

Note that the word "template" is used to signify data representing the sound of a word, with which a
comparison can be made; it is not intended to imply any particular type of speech recognition.

Preferably, the two sets of templates correspond to male and female speech respectively. The set of
words may comprise the digits 0 to 9 or some other suitable combination.

The service may be an interactive voice-operated service accessible via a telephone line and, in a
preferred embodiment of the invention, is a home banking service.

In accordance with the invention there is also provided a method of operating an automated
voice-operated service on a telephone network, the method comprising determining the identity of a caller
to this service through the use of a speech recogniser operable to compare each word of a sequence of words
supplied thereto by the caller with stored templates to obtain a measure of similarity, the templates
comprising a first set containing first templates corresponding to each word of a set of words, and a second
set containing second templates corresponding to each word of the said set, and control means arranged in
operation so to control the recogniser that: (a) the first word of the sequence is compared with templates of
the first set of templates corresponding to certain predetermined words of the set and with templates of the
second set of templates corresponding to other words of the set; (b) the measures are compared to
determine which word of the set the said first word most closely resembles; (c) subsequent words of the
sequence are compared with either the first or the second set of templates, in dependence on which word
the said first word has been determined as resembling; the identity of the caller being determined in
accordance with the sequence of words which are recognised.

Subsequent to the identity of the caller being determined, the identity of the caller may be checked by
causing the caller to utter a personalised sequence of words, for example a personal identification number,
the uttered words being compared by a speech recogniser with previously stored templates specific to the
caller identified as the result of the determination, the caller being given access to further aspects of the
service only if the results of the speaker-dependent recognition check of the personalised sequence of
words are acceptable.

The apparatus shown in the figure forms part of the control equipment of a telephone banking service. 
A telephone line 1 is connected to a speech recogniser 2 and a speech synthesizer 3. A controller 4 (e.g. a
microprocessor system or other digital computer) can transfer commands to the synthesizer 3 to cause
desired words to be uttered and receives data output from the recogniser 2. 

The recogniser, of conventional design, compares words received over the telephone line with a set of
templates and produces at its data output:

(i) a code W1 indicating the template which most closely matches;

(ii) a code W2 indicating the next closest template;

(iii) scores D1, D2 indicating the goodness of fit between the word and those two templates.
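
The output format just listed can be pictured with a short sketch. It assumes, purely for illustration, that the recogniser has already produced one deviation score per template, with a lower score meaning a closer match (the convention the description adopts for Table 1). The function name `best_two` and the dictionary layout are hypothetical and are not part of the described apparatus.

```python
def best_two(scores):
    """Return (W1, W2, D1, D2) from a mapping of template code -> score.

    Assumption: a lower score means a closer match, so W1 is the code with
    the lowest score and W2 the code with the next lowest score.
    """
    if len(scores) < 2:
        raise ValueError("need at least two templates to report W1 and W2")
    # Sort (code, score) pairs by ascending score and keep the best two.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    (w1, d1), (w2, d2) = ranked[0], ranked[1]
    return w1, w2, d1, d2
```

For example, scores of 18 for template 1, 25 for template 4 and 40 for template 7 would be reported as W1 = 1, W2 = 4, D1 = 18, D2 = 25.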
In order to use the service, an initial log-on procedure requires the user to identify himself or herself by
uttering a plurality of words, preferably a series of digits, though other words could of course be used. A
five-digit log-on number is hereafter assumed. The resulting codes W1 are received by the controller 4 and
thus identify the user. Further steps (with which we are not here concerned) then follow; for example the
checking of a personal identity number (PIN) - for which of course speaker-dependent recognition (i.e. using
templates specific to the particular user) may now be employed. Speaker-independent recognition must of
course be used to recognise the log-on number.
The aspect of the system with which we are here particularly concerned is the allocation of the log-on
number for a new user, the object being to avoid digits which represent a high probability of
mis-recognition during subsequent log-on procedures.

The user is asked to utter the words one, two, three, four, five, six, seven, eight and nine. These voice
samples, along with any other desired words, may be recorded for the formation of templates for
speaker-dependent recognition. For present purposes, however, they are passed to the speech recogniser 2,
which produces, for each word, the codes W1 and W2 and scores D1, D2 as indicated in Table 1 below.

                Table 1

         one     two     three   ...   nine

W1        1       3        3

W2        4       2        4

D1       18      34       21

D2       25      29       42

The codes shown for W1, W2 are simply the actual numbers which are deemed by the recogniser to
have been recognised. The scores are numbers, on an arbitrary scale, indicating the extent of deviation of
the received word from the template; thus a low score indicates a good match whilst a high score indicates
a poor match. The scores given are assumed to be actual results for a particular user. Looking at the
example given in Table 1, it can be seen that the words "one" and "three" have been correctly recognised,
whilst the word "two" has been mistaken for "three".

The controller 4, having received this data, applies certain criteria to ascertain which digits (if any) offer
such poor reliability of recognition that they should be excluded from the log-on number. For the example
user of Table 1, the word "two" is rejected since mis-recognition has actually occurred. However, the
remaining digits are also examined. For reliable recognition, what is important is not that the score obtained
should be low, but that the score obtained with the "correct" template is significantly better than with the
second choice template. Thus in the example the word "three" has scores of 21 and 42 and will therefore
be more reliably recognised than "one", where the scores, though lower, are similar and mis-recognition
may occur. Thus, in this embodiment, the criterion applied is that (D2-D1) is greater than some threshold T.

The controller 4 computes the difference (D2-D1) for each digit, compares the differences with the
threshold T and rejects any digits which do not meet the criterion. It then allocates to the user a five-digit
log-on number none of whose digits is one which has failed to meet the criterion. Clearly, if more
acceptable digits are available than are needed, it is preferable to use those with the larger differences. This
allocation can be in accordance with any convenient procedure - e.g. selection of the next unused number
which does not contain a rejected digit.
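
The selection and allocation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the controller's actual program: the function names, the `results` dictionary layout (spoken digit mapped to W1, W2, D1, D2) and the choice of "lowest unused number" as the convenient procedure are all hypothetical.

```python
from itertools import product


def acceptable_digits(results, threshold):
    """Return digits that pass the (D2 - D1) > T criterion, best margin first.

    `results` maps each spoken digit to its (W1, W2, D1, D2) tuple.
    A digit whose first choice W1 is wrong is rejected outright.
    """
    margins = {}
    for digit, (w1, _w2, d1, d2) in results.items():
        if w1 != digit:
            continue  # mis-recognition actually occurred
        margin = d2 - d1
        if margin > threshold:
            margins[digit] = margin
    # Larger differences are preferred, so list them first.
    return sorted(margins, key=margins.get, reverse=True)


def next_logon_number(allowed, used, length=5):
    """Return the lowest unused number built only from allowed digits.

    `used` is a set of already allocated numbers held as strings;
    None is returned if no number is available.
    """
    for combo in product(sorted(allowed), repeat=length):
        candidate = "".join(str(d) for d in combo)
        if candidate not in used:
            return candidate
    return None
```

With the three columns of Table 1 and an assumed threshold T of 10, only "three" passes: "two" was mis-recognised, and "one" has a difference of just 7.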
The threshold T can be selected on the basis of experience at a level falling between one at which
digits are rarely rejected and one at which many digits are consistently rejected. With a practical selection
for T it may occur, with users having particularly poor diction, that eight or even nine digits are rejected.
Therefore the controller may reduce the threshold T for that particular user down to a level such that a
desired minimum number of digits meet the criterion - for example, it may be decided that a five-digit
log-on number should contain at least three mutually different digits.
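
A per-user reduction of the threshold might look like the sketch below. The step size of 1, the floor of zero and the function name `relax_threshold` are assumptions made for illustration; the description does not say how the threshold is lowered, only that it is lowered until enough digits qualify.

```python
def relax_threshold(margins, threshold, min_digits=3, step=1):
    """Lower T until at least `min_digits` digits satisfy margin > T.

    `margins` maps each correctly recognised digit to its (D2 - D1) value.
    The threshold is never lowered below zero; if too few digits qualify
    even then, zero is returned and the caller must handle that case.
    """
    t = threshold
    while t > 0 and sum(1 for m in margins.values() if m > t) < min_digits:
        t -= step
    return max(t, 0)
```

For instance, with differences of 7, 21, 4 and 12 for four digits and a starting threshold of 10, the threshold is lowered to 6, at which point three digits (those with differences 7, 12 and 21) meet the criterion.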

Two modifications to this procedure will now be described. In the first, it is assumed that the controller,
when operating a log-on procedure, is capable of resolving ambiguities in recognition - i.e. a situation where
scores D1 and D2 are the same or differ by a small amount (say 5) - by offering the first choice back to the
user (via the speech synthesizer 3) and asking for confirmation, and, if this is not forthcoming, offering the
second choice. (Such arrangements are the object of our pending UK patent application no. 2192746A).
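
The confirmation dialogue can be sketched in a few lines. The `confirm` callback stands in for the synthesizer prompt and the user's yes/no reply; it, the function name `resolve` and the treatment of a difference of exactly 5 as ambiguous are assumptions for illustration only.

```python
def resolve(w1, w2, d1, d2, confirm, margin=5):
    """Return the recognised word, asking the user when the scores are close.

    `confirm(word)` is a hypothetical callback that offers `word` back to
    the user and returns True if the user confirms it.
    """
    if d2 - d1 > margin:
        return w1          # unambiguous: accept the first choice
    if confirm(w1):
        return w1          # first choice confirmed
    if confirm(w2):
        return w2          # fall back to the second choice
    return None            # neither choice confirmed
```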

In order to allow in extreme cases the use of digits which are incorrectly recognised (but where the
second choice is correct) during the allocation procedure, the criterion is modified as follows:

(i) if neither the first nor the second choice is correct, the digit is given a zero rating;

(ii) if the first choice is correct, the rating is defined as (D2-D1) + 5 (i.e. occupying a rating range from 5
upwards);

(iii) if the second choice is correct, the rating is defined as 5-(D2-D1); a negative result is deemed to be
zero (i.e. those cases where the difference is less than 5 occupy the rating range from 1 to 4).
The rating is then compared with the threshold value, in the same manner as previously.
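
The three rating rules can be written out directly. The sketch below is illustrative only; the function name `rating` and the `band` parameter (the value 5 used in the rules) are hypothetical names.

```python
def rating(spoken, w1, w2, d1, d2, band=5):
    """Rate one digit according to the modified criterion.

    `spoken` is the digit the user was asked to say; W1/W2 are the first
    and second choices and D1/D2 their scores (lower is a closer match).
    """
    diff = d2 - d1
    if w1 == spoken:
        return diff + band            # first choice correct: 5 upwards
    if w2 == spoken:
        return max(band - diff, 0)    # second choice correct: negative -> 0
    return 0                          # neither choice correct
```

Using the Table 1 scores, "three" (21 and 42) rates 26 and "one" (18 and 25) rates 12. A digit recognised only as second choice with scores of, say, 34 and 36 would rate 3.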

In the second modification, cognisance is taken of the fact that recognition may be improved by using
different templates for male and female speakers. In this case, during the allocation process, speech
recognition scores are obtained using both sets of templates. The controller determines which set gives the
best results (i.e. which set gives the largest (D2-D1) values or which set gives measures which are in general
larger than the others) and carries out the remainder of the allocation process using the results from that set
only.
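
One way to compare the two template sets is sketched below. Summing the gap between the second-best and best scores over the correctly recognised digits is an assumed way of deciding which set gives "in general larger" measures; the description does not fix the aggregate, and the names used are hypothetical.

```python
def pick_template_set(results_by_set):
    """Return the name of the template set with the largest total margin.

    `results_by_set` maps a set name (e.g. "male", "female") to a mapping
    of spoken digit -> (W1, W2, D1, D2) obtained with that set.
    """
    def total_margin(results):
        # Only digits whose first choice is correct contribute.
        return sum(
            d2 - d1
            for digit, (w1, _w2, d1, d2) in results.items()
            if w1 == digit
        )

    return max(results_by_set, key=lambda name: total_margin(results_by_set[name]))
```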

If the first digit of the allocated number is chosen to indicate whether the user is male or female, for
example by allocating 1, 2 or 3 in the former case and 4, 5 or 7 in the latter, then the recogniser can be
arranged during the log-on process to use, for the first digit, male templates for 1, 2 and 3 and female
templates for 4, 5 and 7. The set of templates to be used for recognition of the remaining four digits is then
determined by the recognition result from the first digit.
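
The log-on flow described in this paragraph can be sketched as follows. The `score` callback stands in for the recogniser's template comparison and is hypothetical, as are the function and constant names; the first-digit assignments (1, 2, 3 male; 4, 5, 7 female) are the example values given above, and the digit vocabulary 0 to 9 for the later digits is an assumption.

```python
MALE_FIRST = (1, 2, 3)     # first digits allocated to male users (example values)
FEMALE_FIRST = (4, 5, 7)   # first digits allocated to female users (example values)


def recognise_logon(utterances, score):
    """Recognise a log-on number, choosing the template set from digit one.

    `utterances` is a list of opaque speech samples, one per digit.
    `score(sample, set_name, digit)` is a hypothetical callback returning
    the deviation of `sample` from that template (lower is closer).
    """
    first = utterances[0]
    # The first digit is compared with a mixture of templates from both sets.
    candidates = [("male", d) for d in MALE_FIRST]
    candidates += [("female", d) for d in FEMALE_FIRST]
    set_name, first_digit = min(candidates, key=lambda c: score(first, c[0], c[1]))

    # The remaining digits use only the set selected by the first digit.
    digits = [first_digit]
    for sample in utterances[1:]:
        digits.append(min(range(10), key=lambda d: score(sample, set_name, d)))
    return set_name, digits
```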

Claims 

1. An apparatus for operating a voice-operated service comprising a speech recogniser operable to 
compare each word of a sequence of words supplied thereto with stored templates to obtain a measure 
of similarity, the templates comprising a first set containing first templates corresponding to each word 
of a set of words, and a second set containing second templates corresponding to each word of the
said set, and control means arranged in operation so to control the recogniser that:

(a) the first word of the sequence is compared with templates of the first set of templates 
corresponding to certain predetermined words of the set and with templates of the second set of 
templates corresponding to other words of the set; 

(b) the measures are compared to determine which word of the set the said first word most closely
resembles; and

(c) subsequent words of the sequence are compared with either the first or the second set of 
templates, in dependence on which word the said first word has been determined as resembling. 

2. An apparatus according to Claim 1 in which the two sets of templates correspond to male and female
speech respectively.

3. An apparatus according to Claim 1 or 2 in which the service is an interactive voice-operated service 
accessible via a telephone line. 

4. An apparatus according to Claim 3 characterised in that the voice-operated service is a home banking
service. 

5. An apparatus according to any one of Claims 1 - 4 wherein the set of words comprises the digits 0-9.

6. A method of operating an automated voice-operated service on a telephone network, the method
comprising determining the identity of a caller to this service through the use of a speech recogniser
operable to compare each word of a sequence of words supplied thereto by the caller with stored
templates to obtain a measure of similarity, the templates comprising a first set containing first
templates corresponding to each word of a set of words, and a second set containing second templates
corresponding to each word of the said set, and control means arranged in operation so to control the
recogniser that: (a) the first word of the sequence is compared with templates of the first set of
templates corresponding to certain predetermined words of the set and with templates of the second
set of templates corresponding to other words of the set; (b) the measures are compared to determine
which word of the set the said first word most closely resembles; (c) subsequent words of the
sequence are compared with either the first or the second set of templates, in dependence on which
word the said first word has been determined as resembling; the identity of the caller being determined
in accordance with the sequence of words which are recognised.

7. A method as claimed in Claim 6 in which the identity of the caller is subsequently checked by causing
the caller to utter a personalised sequence of words, for example a personal identification number, the
uttered words being compared by a speech recogniser with previously stored templates specific to the
caller identified as the result of the determination, the caller being given access to further aspects of
the service only if the results of the speaker-dependent recognition check of the personalised sequence
of words are acceptable.

8. A method as claimed in Claim 6 or Claim 7 in which the two sets of templates correspond to male and 
female speech respectively. 

9. A method as claimed in any one of Claims 6 - 8 wherein the service is a home banking service.

10. A method as claimed in any one of Claims 6 - 9 wherein the utterances of words uttered by the caller
during the determination step are stored for use in a subsequent speaker-dependent recognition
process.